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ABSTRACT 

We present BayeSED, a general purpose tool for doing Bayesian analysis of SEDs by using whatever 
pre-existing model SED libraries or their linear combinations. The artificial neural networks (ANNs), 
principal component analysis (PGA) and multimodal nested sampling (MultiNest) techniques are em- 
ployed to allow a highly efficient sampling of posterior distribution and the calculation of Bayesian 
evidence. As a demonstration, we apply this tool to a sample of hyperluminous infrared galaxies 
(HLIRGs). The Bayesian evidences obtained for a pure Starburst, a pure AGN, and a linear com- 
bination of Starburst-|-AGN models show that the Starburst -1- AGN model have the highest evidence 
for all galaxies in this sample. The Bayesian evidences for the three models and the estimated con- 
tributions of starburst and AGN to infrared luminosity show that HLIRGs can be classified into two 
groups: one dominated by starburst and the other dominated by AGN. Other parameters and cor- 
responding uncertainties about starburst and AGN are also estimated by using the model with the 
highest Bayesian evidence. We found that the starburst region of the HLIRGs dominated by starburst 
tends to be more compact and has a higher fraction of OB star than that of HLIRGs dominated by 
AGN. Meanwhile, the AGN torus of the HLIRGs dominated by AGN tend to be more dusty than 
that of HLIRGs dominated by starburst. These results are consistent with previous researches, but 
need to be tested further with larger samples. Overall, we believe that BayeSED could be a reliable 
and efficient tool for exploring the nature of complex systems such as dust-obscured starburst- AGN 
composite systems from decoding their SEDs. 

Subject headings: galaxies: active - galaxies: evolution- galaxies: ISM - galaxies: starburst - methods: 
data analysis - methods: statistical 



L INTRODUCTION 

The formation and evolution of galaxies and super- 
massive black holes (SMBHs) are now believed to 
be tightly related. Meanwhile, violent formation of 
stars (starburst) and growth of SMBHs (AGN) are 
found to be coupled and ongoing together in the most 
infrared-luminous galaxies, such as ultraluminous in- 
frared galaxies (ULIRGs) and hyperluminous infrared 
galaxies (HLIRGs). These galaxies represent important 
phases in the formation and evolution of galaxies, and 
ideal laboratories for studying the starburst-AGN con- 
nections. Since both of star formation and SMBHs ac- 
cretion are taking place, while a large amount of dust 
is distributed throughout, this kind of galaxies are very 
complex dust-obscured starburst-AGN composite sys- 
tems. The spectral energy distributions (SEDs) are the 
primary source of information for our understanding of 
them. However, currently it is still very challenging to 
efficiently extract the basic physical properties of these 
galaxies from the analysis of their SEDs. 

The analysis of SED, or SED fitting, tries to ex- 
tract one or several physical properties of a galaxy from 
fitting models to the observed SED. Nowadays, new 
observing facilities and large surveys allow us to ob- 
tain the full SEDs at wavelengths from the X-rays to 
the radio for galaxies extending from local to redshifts 



higher than 6. On the other hand, as the starting 
point of SED fitting, a library of model SEDs needs to 
be built in advance. For most galaxies, stars are the 
main sources of lights. The evo lutionary population syn- 
thesis (EPS) models (iTinslevI [19 72: Scarle ct al. 1973 
TinslevifTOTl iL arson fc TinslevlTl978 : Bruzual A. 1982 

Leitherer et al. 199£ 
2005; Z hang et aLl 
which are 
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Bruzual & Chariot 2003; 'Marastori 
2005a, b; Han et al. 2007; Conroy et al. 2009) 
based on the knowledge of stellar evolution such as the 
assumed stellar initial mass function (IMF), star forma- 
tion history (SFH), stellar evolutionary tracks, and stel- 
lar libraries, are standard tools for modeling the SEDs of 
galaxies. 

Meanwhile, the dusty interstellar medium (ISM), if 
presented, have important effects on the resulting SEDs 
of galaxies. A fraction, or most in extreme cases such as 
ULIRGs and HLIRGs, of the initial radiations from stars 
are absorbed and reprocessed by the gas and dust in the 
ISM. Gases heated by young stars produce luminous neb- 
ular emission lines, while dusts heated by stars of all ages 
produce the mid-infrared (MIR) and far-infrared (FIR) 
emission. A simple method is to handle the absorption 
of star lights and their re-emission independently, and 
then connect them by assuming that the total energy 
absorbed in the UV-optical equals to the total e nergy 
re-emitted in the MIR and FIR (iDevriendt et all 119991 : 
Ida Gunha et"aI1 [20081: INoU et all 120091) . A mo7e self- 
consistent treatment requires detailed radiative transfer 
(RT) calculations to be performed using the ray-tracing 
jSilva et al. 1998; Efstathiou ct al. 2000; Gr anato et al.l 
[2000t iTuffs et al.l 120041 : ISiebenmorgen fc Kriigell 120071 : 
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Groves et al.l l2008D. or Monte-Carlo in et hod (iBaes et alj 
20031 iDuU emond & Dominikl 12004 iJonssonI l200a 
Chakrabarti fc WhitnevI 120091 ) . However, these RT 



calculations are commonly computationally expensive. 

Finally, if a powerful active galactic nucleus (AGN) 
is presented in the center of a galaxy, the resulting 
SED would be largely modified. AGN can contribute to 
all wavelength regimes of the electromagnetic spectrum, 
with accretion-disk+corona to the X-ray-UV-optical, 
torus to the IR, and jet to the radio and gamma-ray 
in some cases. In quasars, the AGN light dominates over 
the integrated galaxy light at almost any wavelength, 
while for AGNs with lower luminosities, the contribution 
of the host galaxy may dominate in many wavelengths. 
The modeling of the SEDs of various components of AGN 
have been developed independently, and all needs a spe- 
cial suite of parameters. Meanwhile, the high-energy ra- 
diations from the center AGN can also be absorbed by 
the ISM in the host galaxy and re-emitted in the IR. So, 
if AGN is presented, the relative geometry of starburst, 
ISM, and AGN is important for modeling the SEDs of 
such dust-obscured starburst- AGN composite systems. 
Furthermore, if violent starburst and AGN activities are 
coupled and ongoing together, it may not be reasonable 
to model the SEDs of such systems by a simple linear 
combination of models for starburst and AGN developed 
independently. 

Overall, the SEDs of dust-obscured starburst-AGN 
composite systems are very complicated. A completely 
self-consistent SED model for such complicated systems 
must be very hard to be constructed. To make progress, 
parameterizations of all possible components, their rel- 
ative geometry and possible physical relations are in- 
evitable. Given the complexities mentioned above, it 
is natural to expect a large number of parameters, and 
many possible degeneracies between them. Meanwhile, 
since the effects of dust attenuation, line emission, and 
dust emission have to be taken into account, the prob- 
lem of determining the physical parameters from fitting 
model to observations is highly nonlinear. 

The SED fitting methods have been improved sig- 
nificantly in the last decade, which allow us to ex- 
tract much more complex infor mation imprinted in the 
SEDs (see iWalcher et al.l 120111 for a recent review of 
this field). Among the numerous SED fitting method, 
we believe the metho d based on Bayesia n inference 
dBe nitcz 2000; Kauffmann et al. , 2003 : Fe l dmann et al.l 
l2006i: ISalinTet all l2007t IBailer-.Tonegi l20lH) . in which 
multi- wavelength SEDs are fitted by firstly precomputing 
a library of model SEDs with varying degrees of complex- 
ity and afterwards determining the model and/or model 
parameters that best fit the data, is the best choice for 
the problem we are facing. This method gives us de- 
tailed probability distributions of model parameters and 
the Bayesian evidence as a quantitative evaluation of the 
entire model given the data. 

However, the Bayesian approach requires an inten- 
sive sampling of the posterior distribution, which is a 
function of all parameters, and resulted from combin- 
ing all priori knowledges about the parameters of the 
model and the new information introduced by the obser- 
vations. So, if the model used to explain the observations 
is itself computationally expensive, the sampling must 
be very time consuming. Furthermore, as mentioned 



above, the model SEDs are commonly precomputed as a 
library, rather than computed during the sampling. For 
this reason, an interpolation method must be employed. 
Since the dependences of parameters with the resulting 
SED are highly non-linear and the number of param- 
eters is very large, common interpolation methods are 
not very suitable. This problem can be solv ed more eas- 
ily w it h art ificial neural n etwor k s (ANN) (iLahav et al.l 
1991 [Bert in & Ai^ ^Imiti [19961: lAndreon et al.l 120001 
Firth et al. 2003: CoUister & Lahav"2004': VanzellaeLai] 
2004; Carballo ct al. 2008; Auld ct al. 2008). An ANN 
can be trained to approximate a library of model SEDs 
with highly non-linear complexities, and allow the pa- 
rameter space of the model to be explored more continu- 
ously. On the other hand, a principal component analy- 
sis (PGA) ( Francis ct al. 1992; Glazcbrook et al. 1991 
'Wil d fc HewettI l2005t IWild et al.l 120071 : IBudavari et ail 
2009) can be applied to the library of model SEDs in 
advance to simplify the required structure of the ANN. 

These methods have been demonstra ted nicely by 
lAsensio Ramos fc Ramos Almeidal ()2009D . who apply it 
to a recently developed public database of clumpy dusty 
torus mode l (jNenko va ct al. 2008a, b). Almeida et aL| 
(|2010( ) and iSilva et alJ (201 If ) implemented ANN into 
the RT code GRASIL to speedup the computation 
of the SED of galaxies when applied to semi-analytic 
models (SAMs: 'Wh Ke fc ReeslfTgTSi: ILacev fc Silklll991l: 
White & Frcnk 199f~ The core of these methods are ac- 
tually very general and can be applied to any problem 
regarding fitting precomputed libraries of model SEDs to 
observations. However, these methods are commonly im- 
plemented specifically for a special problem, and not con- 
venient to be used in other similar problems. Inspired by 
these works, we have built a suite of general purpose pro- 
grams to generalize these methods such that they can be 
integrated together to do the Bayesian analysis of SEDs 
by comparing whatever pre-existing model SED libraries 
or their linear combinations with observations. 

This paper is structured as follows. In fj2]we describe 
the PGA and ANN methods used to boost the genera- 
tion of model SEDs, and our implements of them. In 
i|3]we present our Bayesian inference tool, BayeSED. We 
begin in ij3.1l bv a general introduction to Bayesian infer- 
ence methods. In ^'d.2[ we discuss the posterior sampling 
methods. Then, the construction of BayeSED is pre- 
sented in In jS l wc apply this tool to the HLIRGs 
sample of iRuiz e t^l. (2007). Finally, a summary of this 
paper is presented in ij5l 

2. GENERATION OF MODEL SEDS 

2.1. Principal Component Analysis of Model SED 
Libraries 

A SED can be described by a vector of N fiux densities 
[III h, - ■ ■ Jn) at wavelengths (Ai, A2, • • • , Aat). How- 
ever, when the flux at a given wavelength is changed, the 
fluxes at surrounding wavelength points are also chang- 
ing in a very similar way due to the continuity of the 
SED. So, the fluxes at different wavelengths are not com- 
pletely independent and the actual dimension of the SED 
is much less than N . This simple fact make it possible 
to apply some dimensionality reduction techniques to ef- 
flciently compress the representation of a SED. 

One such technique is called principal component anal- 
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ysis (PGA). It can be used to derive an optimal set of 
linear components, called principal components, by diag- 
onalizing the covariance matrix of a set of SEDs to find 
the directions of greatest variation. Then, the original 
data can be approximated by a linear combination of first 
N' ^ N independent principal components. It is worth 
noting that PGA performs a linear analysis. So, if the 
dependences of parameters with corresponding SED is 
non-linear, the number of necessary eigenvectors is com- 
monly larger than the number of physical parameters of 
the model. 

We adopt an IDL package for PGA 0, which gathers 
together several algorithms for PGA into a single pack- 
age and all with the same usage. The Singular Value 
Decomposition (SVD) rather than the Robu st and Itera- 
tive a lgorithm provided in this package tBu davari et al.l 
120091 ) is used. The later two algorithms, when applied to 
the observed SEDs, can be used to obtain eigenvectors 
with more clear physical meanings. However, we apply 
PGA to model SEDs to reduce them in a purely mathe- 
matical sense. Meanwhile, SVD algorithm is much faster 
and good enough for our purpose. 

We have applied PGA to two widely used model SED 
libraries: SBgrid for star bur st galaxies and ULIRGs 
(|Siebenmorgen fc Krtigell 120071) and GLUMPY for AGN 
clumpv torus (|Nenkova et~ 1 l2008illbD . The model SEDs 
in the SBgrid library have 5 parameters: nuclear radius 
i?, total luminosity itotj ratio of the luminosity of OB 
stars with hot spots to the total luminosity /ob, the 
visual extinction from the edge to the center of the nu- 
cleus Ay, and the dust density in the hot spots n^s (see 
iSiebenmorgen &: Kriigelll2007l for detailed explanations 
about these parameters) . On the other hand, the model 
SEDs in the CLUMPY library have 6 parameters about 
the dusty and clumpy torus: the ratio of the outer to the 
inner radii of the toroidal distribution Y = Ro/Rd, the 
optical depth of clumps ry, the number of clouds along 
a radial equatorial path N, the power of the power law 
(r~'^) describing radial density profile q, the width pa- 
rameter charactering the angular distribution cr, and the 
viewing angle measured from the torus polar axis i. 

However, it is very necessary to do some normalizations 
to the libraries before doing the PGA of them. Firstly, we 
find that the PGA will have better performance if we use 
the logarithm of fiux. Secondly, the mean spectra of a 
SED library is found and removed from every SED in the 
library in advance. These normalizations will impact on 
the resulting eigenvectors. The different physical mecha- 
nisms associated with each eigenvector is not important 
for us, and we have not tried to find them out. We only 
treat the PGA eigenvectors as a set of purely mathemati- 
cal basis that allows us to efficiently reconstruct all SEDs 
in a library. 

The first 16 eigenvectors that we have obtained for the 
two libraries are shown in Figure [TJ This figure shows 
that low-order eigenvectors, which have larger variations 
in amplitude for different SEDs, are much smoother than 
the high-order eigenvectors, which have smaller varia- 
tions in amplitude for different SEDs. The low-order 
eigenvectors determine the general shape of a SED while 
the high-order eigenvectors give some more details of the 
SED. Then, any SEDi, in the original library can be ap- 

* |http : //www ■ roe ■ ac . uk/ - vw/ vwpca ■ tar ■ gz | 



proximately reconstructed from a linear combination of 
these eigenvectors : 



SEDi 
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Since the SEDs in the library have been normalized in ad- 
vance, the corresponding inversions are needed after this. 
An example of SED-Reconstruction is shown in Figure 
S] of the next subsect ion to be compared with that ob- 
tained from ANN. As lAsensio Ramos &: Ramos Almeidal 
(2009), we found that 16 PGs are enough to obtain an ac- 
ceptable reconstruction for model SED libraries as com- 
plex as GLUMPY and SBgrid. Now, each SED of the 
two libraries can be represented by a vector of 16 coeffi- 
cients corresponding to 16 principal components (PGs), 
rather than a vector of 124 fiuxes for GLUMPY library or 
a vector of 318 fluxes for SBgrid library. It is clear that 
with the help of PGA the size of a model SED library 
can be greatly reduced. 

2.2. Implement of Artificial Neural Networks 

ANNs are mathematical constructs designed to sim- 
ulate some intellectual behaviors of the human brain. 
For example, it can 'learn' relations between some in- 
puts and outputs by training with many living exam- 
ples. After that, it can be used to predict the out- 
puts from a new set of inputs. Nowadays, ANNs 
have been used successfully in a wi de range of prob- 



lems m cosmology ana astropnysics (iL/anav et ai.Mitjijbt 
Bertin &: Arnoutslll996l:IAndreon et al ."2000': 'Firth et al.l 
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lems in cosmology and astrophysics (iLahav et al.l 119961: 

et al 

„ ■ TVai-_ 

Garb allo et al.l 120081: lAuld et al.l [2008 ). Here, we use 
ANNs to learn the relations between parameters and the 
resulting SEDs for libraries of model SEDs. After train- 
ing, an ANN can be used as a substitute to the model 
SED library which is used to train it, and even inter- 
polate the library to obtain the SED for values of the 
parameters not present in the original grid. 

There are different implements of ANNs which dif- 
fer in the neurons (nodes) organization and informa- 
tion exchanging methods. We have modified ANNz, a 
widely used tool for estimating photometric redshifts us- 
ing ANN, to be a more convenient and general purpose 
ANN code without changing the technical implement of 
ANN. The type of ANN implemented in ANNz is so- 
called multi-layer perceptron (MLP) feed-forward net- 
work. A MLP network consists of a number of layers of 
nodes with the first layer containing the inputs, the final 
layer containing the outputs, and one or more intervening 
(or "hidden") layers. In a feed-forward network, which 
is the most widely used due to its simplicity, information 
propagate sequentially from the input layer, through the 
hidden layers to the output neurons, without any feed- 
back. The network architecture of such an ANN can be 
denoted by iVin:A^i:A^2: • • ■ :A^out where TVin is the number 
of input nodes, iY; is the number of nodes in ith hidden 
layer, and A^out is the number of output nodes. 

In Figure [51 we show the network architectures of 
ANNs used for SBgrid and GLUMPY libraries. The in- 
puts of an ANN are the parameters of the library used 
to train it. So, the ANN for SBgrid library has 5 in- 
puts while the ANN for GLUMPY library has 6 inputs. 
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Fig. 1. — First 16 eigenvectors obtained from PCA of the SBgrid 




Fig. 2. — The network architecture of ANNs for SBgrid (5 input 
parameters) and CLUMPY (6 input parameters) library. In both 
cases, a hidden layer with 20 nodes is used. The outputs of each 
ANN are the coefficients corresponding to the first 16 PCs. 

The capabihty of an ANN is determined by the struc- 
ture of its hidden layers. In mathematics, the universal 
approximation theo rem (|Cvbenko 1989; Kurt & Hornik 
119911 : IHavkinl [1999) states that a standard multilayer 
feed-forward network with only a single hidden layer and 
arbitrary continuous, bounded and nonconstant activa- 
tion function can approximate any continuous function to 
arbitrary precision, provided only that sufficiently many 
hidden units are available. More nodes in a single hidden 
layer or even more hidden layers can increase the degree 
of approximation, but with the expense of much more 
training time. In practice, we found that a single hidden 
layer with 20 nodes is enough for the two libraries. The 
outputs of ANNs are set to be the projections of a SED 
on the first 16 PCs (eigenvector). So, the structure of 
ANNs for SBgrid and CLUMPY library can be denoted 
as 5:20:16 and 6:20:16, respectively. 

An ANN "learn" the relationship between inputs and 
outputs from examples (pairs of inputs and correspond- 
ing outputs). In our case, the examples are model SEDs 
of a library whose parameters and corresponding projec- 
tions have already been known. When a set of inputs 
is given, the ANN "learn" the relationship by adapting 
weights associated with connections between nodes so as 
to minimize the cost function, which represents the dif- 
ference between the prediction of ANN and the expected 
outputs. An iterative quasi-Newtonian method is used 




0.1 1 10 100 0.1 1 10 100 0.1 1 10 100 0.1 1 10 100 
Wavelength (nm) 



(left) and CLUMPY (right) model SED libraries. 

in ANNz to perform this minimization. Meanwhile, an 
activation function, which is taken to be a sigmoid func- 
tion in ANNz, is defined at each node to simulate the 
behavior of biological neurons. This defines the signal 
propagation rule of an ANN in the sense that a neuron is 
activated, which means it transmits the received signal 
further on, when the total of received signals is greater 
than a certain threshold. 

To avoid overfitting to the training set and optimize the 
generalization performance of the network, the SEDs in 
a library are separated into two sets: a training set and a 
validation set. Both of them are randomly selected from 
the library. For the SBgrid library^, the training set con- 
tains 6,495 (90%) SEDs while the validation set contains 
721 (10%) SEDs. The CLUMPY library currently con- 
tains about 1307980 SEDs. Ahhough all of these SEDs 
can be used to train the ANN, we found that this is not 
necessary. So, we have randomly selected about 10% 
SEDs of the CLUMPY library, which contains 130,800 
SEDs. Then, 117,720 (90%) of them are used as train- 
ing set while the other 13,080 (10%) are used as valida- 
tion set. On the other hand, the ANN usually converges 
to different local minima of the cost function, depend- 
ing on the particular initialization. So, for each library, 
a group of 4 networks (called a "committee") with the 
same structure but different initialization are trained in- 
dependently, and the mean of the individual outputs of 
the 4 networks are used as a more accurate estimate for 
the outputs. 

In Figure [31 the projections on the first 4 PCs for the 
SEDs in the validation set from ANN are compared with 
that directly from PCA of the libraries. As clearly shown, 
the projections can be reliably predicted from ANN. For 
both libraries, the rms error CTrms of the predicted projec- 
tions are very small. So, it is reasonable to expect that 
the SEDs in the libraries can be reliably reconstructed 
by using these projections as coefficients for the linear 
combination of PCs. In Figure HI examples of SED- 
reconstruction by using projections as coefficients for the 
linear combination of PCs are shown. It is clear that the 
SEDs in the two libraries can be reliably reconstructed by 
using the projections directly from PCA of the libraries 

^ Four SEDs in this library have been excluded, since they be- 
come discontinuous below about 1 fim. 
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Fig. 3. — Left: The projections on the first 4 PCs for the 721 SEDs (validation set) randomly selected from the SBgrid library are predicted 
from ANN and then compared with that obtained directly from PCA. Right: As left, but for 13080 SEDs (validation set) randomly selected 
from the CLUMPY library. In both cases, the projections can be predicted from ANN with very small (Tmis. So, the SEDs in the libraries 
can be reliably reconstructed from PCs by using these projections as coefficients for the linear combination of PCs. 

or that predicted from ANN. 



3. BAYESED: A TOOL FOR BAYESIAN ANALYSIS OF SED 

3.1. Bayesian Inference 

Bayesian methods have already b een widely us ed in as- 
trophysics and cosmology (see, e.g.. lTrottall2008l for a re- 
view). They have the advantage of higher efficiency and 
of a more consistent conceptual basis for dealing with 
problems of induction in the presence of uncertainties 
than traditional statistical tools. Bayesian methods are 
basically divided into two categories: parameter estima- 
tion and model comparison. The basis of these methods 
is the so-called Bayes' Theorem which states that 



P(J\M) 



(2) 



where ~^ represents a vector of parameters, represents 
a vector of data sets, M represents a model under con- 
sideration. 

The left side of the Equation PCdTd, M) is called 
the posterior probability of parameter u given data "cf 
and model M . It is proportional to the sampling distribu- 
tion of the data Pi , M) assuming the model is true, 

and the prior probability of the model, P{^\M) ("the 
prior"), which describes knowledges about the parame- 
ters acquired before seeing (or irrespective of) the data. 
The sampling distribution describes how the degree of 
plausibility of the parameter 6 changes when new data 
is acquired. It is called the likelihood function when 
being considered as a function of the parameter and 
often written as L (t) =P{t\t,M). 

The posterior probability density function (PDF) for 
one parameter is obtained by marginalizing out (inte- 
grate out) other parameters from the full posterior dis- 
tribution: 

Pi9,\~t,M) = J AOi - ■ ■de.-ide.+i - ■ ■d9N,,,P{t\t ,M). 

(3) 

The normalization constant P{a\M), is called the 
marginal likelihood (or "Bayesian evidence"), which is 
not important for parameter estimation but critical for 



model comparison, and given by 

P{^\M) EE P{t\t, M)P{f\M), 



(4) 



where the sum runs over all the possible choices of the 
parameter ~^ . For a continuous parameter space ^m, 
this can be rewritten as: 

P{t\M)= f PCi\t,M)P{f\M)df. (5) 

In the case of SED fitting, represents the observed 
SED of a galaxy while represents the parameters of 
a model SED library. Commonly, M represents a SED 
library as a whole. However, multiple independent SED 
components (e.g., a starburst component and a AGN 
component) are needed in many cases. Then, different 
combinations of independent SED components should be 
considered as different models. All parameters of sub- 
models are combined together to be a new vector of pa- 
rameters 1^ . For libraries giving relative flux, a free scal- 
ing factor needs to be considered as an additional param- 
eter in the new 



3.2. Posterior Sampling Methods 

A key step in the Bayesian inference problem outlined 
above is the evaluation of the posterior of Equation ([2]) , 
where accurate analytical solutions are commonly not 
easy to obtain or just do not exist. As a consequence, 
some efficient and robust sampling techniques have been 
developed. A widely used technique is called Markov 
Chain Monte Carlo (MCMC). An MCMC sampler, which 
is often based on the standard Metropolis-Hastings algo- 
rithm, provides such a way to explore the posterior distri- 
bution that the number density of samples is asymptoti- 
cally converged to be proportional to the joint posterior 
PDF of all parameters. So, it allows one to map out 
numerically the posterior distribution even in the case 
where the parameter space has hundreds of dimensions 
and the posterior is multimodal and with a complicated 
structure. 

However, such methods can be very computationally 
intensive when the posterior distribution is multimodal 
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Fig. 4. — Left: Examples of model SED (red points) of SBgrid (top) and CLUMPY (bottom) library compared with that directly 
reconstructed by using the projections on the first 16 PCs (blue dash line) obtained by PCA of the library, and that reconstructed by 
using the projections predicted by ANN for the same set of parameters (black solid line). Right: The projections on the first 16 PCs of the 
model SED directly from PCA of the libraries (blue points) compared with that from ANN for the same set of parameters (black points 
with error bar and connected with black line). 

or with large degeneracies between parameters, particu- 
larly in high dimensions. On the other hand, the calcu- 
lation of Bayesian evidence, which is the key ingredient 
for Bayesian model comparison, is extremely computa- 
tionally intensive by using MCMC techniques. Another 
Monte Carlo method called Nested sainpling (jSkillinel 
[200l iMukheriee eTaI|[2Q0l iShaw et all [20071 ) provides 
a more efficient method for the calculation of Bayesian 
evidence, but also produces posterior inferences as a by- 
product. Here, we adopt a newly developed, highly effi- 
cient and free ly available Bayesian inference tool , calle d 
MultiNest (|Feroz fc HobsonI l2008t iFeroz et al.ll2009D . 
It is as efficient as standard MCMC methods for Bayesian 
parameter estimation, more efficient for very accurate 
evaluation of Bayesian evidences for model comparison, 
and fully parallelized. 

3.3. Building-up of BayeSED 

The general structure of our Bayesian inference tool for 
the analysis of SED, BayeSED, is shown in Figure[5l The 
core of BayeSED is the sampling of posterior probability 
with a MCMC or MultiNest sampler. This is shown as 
a loop in the figure. During the sampling, the sampler 
provides proposal parameter vectors for the ANN, and 
the ANN predicts the coefficients for the reconstruction 
of model SEDs using the proposed parameter set. After a 
training with some model SEDs in a library, an ANN can 
help to generate the model SED of any parameter vector 
within the parameter space covered by the library used 
to trained it. Here, it is allowed to simultaneously use 




Fig. 5. — A simple flowchart for the Bayesian analysis of SED 
boosted by PCA and ANN. 

multiple ANNs, which are trained with different model 
SED libraries. The comparison of model with observa- 
tions gives a xi^) ■ Then, the likelihood function is 

given by L{SEDo^^\t , M) = e-^^"^) Z^. 

The priors represent our knowledges about the param- 
eters of the model that are independent of current ob- 
servations. If we have no prior knowledge about the 
model parameters, the prior distributions are commonly 
assumed to be uniform between two physically chosen 
bounds. When the sampling is converged, the posterior 
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PDF for all parameters and the Bayesian evidence for 
the model are obtained. Then, a posterior distribution 
differing from a uniform distribution would imply that 
new information about the corresponding parameter are 
obtained from observations. On the other hand, the ratio 
of evidences for two models, the so-called "Bayes factor" , 
tells us how their relative plausibility should be changed 
as suggested by the new observations. 

4. APPLICATION TO A SAMPLE OF HYPERLUMINOUS 
INFRARED GALAXIES 

4.1. The HLIRG sample and data 

The sample s tudied h ere is the one selecte d by 
iRuiz et al.l (|2007[ ) from the iRowan-RobinsonI ()2000[ ) sam- 
ple of 45 HLIRGs. The sample is limited to sources with 
available X-ray data and with redshift less than 2 
to avoid strong biasing towards high redshift quasars. 
Consequen t ly, the final sample contains thirteen objects. 
I Ruiz et all (|201Clf ) have built multi-wavelength (from ra- 
dio to X-rays) SEDs for these HLIRGs. They fitted stan- 
dard empirical AGN and starburst templates to these 
SEDs and classified the HLIRGs into two groups, named 
class A and class B, according to their different SED 
shapes. These authors also suggested that their simple 
template-fitting approach should be complemented with 
other theoretical models of starburst and AGN emission. 

Here, we present a re-analysis of the SEDs of these 
HLIRGs by using different RT models of starburst and 
AGN emission, and put it on a solid statistical basis. 
The redshifts and observed SED s of these galaxie s are 
taken from the Table 1 and B of lRuiz et al.l ()201C1[) . re- 
spectively. The SEDs have been converted to monochro- 
matic flux density, corrected for the Galactic reddening 
and blue-shifted to rest-frame. Before comparing with 
model SEDs, we convert the monochromatic flux den- 
sity to monochromatic luminosity by using the luminos- 
ity distance ^^(z). 

4.2. Bayesian analysis of SEDs 
4.2. L Models and priors 

Three different models are employed to do Bayesian 
analysis of the SEDs of these HLIRGs. The first is the 
pure starburst model of iSiebenmorgen fc Kriigell (|2007f ) 
as presented in the SBgrid library (noted as 'SB' here- 
after). The priors for the 5 parameters of this model 
are assumed to be uniform distributions truncated to 
the following intervals: R = [0.35, 15] kpc, /ob = 
[0.4,1], log(Ltot/L0) = [10,14.7] , = [2.2,144], 

log(nhs/cm-3) = [2,4]. The SEDs in the SBgrid li- 
brary are in unit of absolute fiux at a distance of 
50Mpc. The absolute flux values have been multiplied 
by Att * 50Mpc * 50Mpc in advance to convert to absolute 
luminosity values. 

The second is the pure AGN model of iNenkova et al.l 
()2008aD as presented in the CLUMPY library (noted 
as 'AGN' hereafter). The priors for the 6 parameters 
of this model are also assumed to be uniform distribu- 
tions truncated to the following intervals: a — [15,75], 
Y = [5,200], N = [1,24], q = [0,4.5], ry = [5,200], 
i = [0, 90]. Since the SEDs in the CLUMPY library have 
been normalized, an additional scaling factor needs to 
be considered as a new parameter. The prior for this 
parameter is assumed to be uniform in the log space: 



TABLE 1 

The Bayesian evidences of the 'SB', 'AGN', a nd 'SB-I-AGN ' 

MODELS FOR GALA XIES IN THE CLASS A AND B Of IRuIZ FT AL.I 

II2007I) HLIRGs sample. 



Source In(evsB) In(eVAGN) ln(evsB+AGN) 
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log(scalcAGN/crg s ^) = [44,50]. 

Finally, the linear combination of the pure starburst 
and pure AGN models is considered as an additional 
new model (noted as 'SB-I-AGN' hereafter). The as- 
sumed priors are the same ones as above. As discussed 
in H2.'2l the model SEDs in the two libraries have been 
used to train two groups of ANNs, respectively. The 
trained ANNs are used as substitutions of the original 
models, and the models can be evaluated continuously 
in the whole parameter space. Since both of the star- 
burst and AGN model used here are not extended to the 
X-ray range, in this paper we mainly focus on the IR 
range (i.e. 1 - 1000 ^m) of the SEDs. The X-ray data 
for galax ies in the H LIRGs sample have also been pro- 
vided bv' Ruiz et al.l (|2010). However, it is very hard to 
construct a self consistent SED model that is able to re- 
produce the whole SED covering such a wide range of 
wavelengths. 

4.2.2. Model comparison 

The Bayesian evidence represents a practical imple- 
mentation of the Occams razor principle. So, a com- 
plex model with more parameters has a lower Bayesian 
evidence unless it provides a significantly better fitting 
to the observations. As mentioned above, in this pa- 
per we consider three different models: 'SB', 'AGN', and 
'SB-f-AGN'. They have 5, 7 and 12 parameters, respec- 
tively. In Table [1] we present the Bayesian evidences of 
the three mo dels for HLIRGs i n the class A and class B 
as defined bv ^ Ruiz "eFall ((20T0I) . Since the Bayesian evi- 
dences for different models spread in a very wide range, 
we use ln(ev,nodci) instead of the evidence itself. 

As shown in Table [3 the 'SB-^AGN' model has the 
highest Bayesian evidence for all galaxies in the HLIRGs 
sample, although it has the largest number of parame- 
ters. So, 'SB+AGN' model provides a much better fitting 
to all of these HLIRGs, which means starburst and AGN 
activities are probably ongoing together in these galax- 
ies. On the other hand, for most class A HLIRGs the 
pure 'AGN' model has a higher Bayesian evidence than 
a pure 'SB', while for most class B HLIRGs the pure 
'SB' model has a higher Bayesian evidence than a pure 
'AGN' model. These results imply that class A HLIRGs 
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are dominated by AGN while class B HLIRGs are domi- 
nated by starburst, although both of starburst and AGN 
are present in all cases. 

4.2.3. Parameter estimation 

In Figure [S] and [3 we show the best fit i.e. the maxi- 
mum a posteriori (MAP) model SEDs that are found dur- 
ing sampling parameter space of the 'SB-I-AGN' model, 
which has the highest Bayesian evidence among the three 
models considered. Commonly, the values of parameters 
corresponding to these best models are taken as the best 
estimation of parameters. However, the Bayesian analy- 
sis method has the advantage of providing the detailed 
posterior distributions for all parameters, which repre- 
sent our full knowledge about these parameters given the 
priors and new observations. The best expectations and 
uncertainties about all parameters can be deduced from 
these possibility distributions. 

In Figure |S1 we show the posterior PDFs of all pa- 
rameters of the 'SB+AGN' model for IRAS18216+6418. 
Given the very limited observations and the large num- 
ber of parameters, it is clear that not all of the parame- 
ters can be well constrained. From the detailed posterior 
PDFs of parameters, it is much easy to find out if a pa- 
rameter is well constrained or not. For example, the basic 
parameters _R, Ltot of starburst component and cr, q of 
AGN component are well constrained. The derived pa- 
rameters log(LgB), log(L5^j^) and log(L!j!Qrp) are nicely 
constrained. 

Apparently, for a large number of galaxies in a sample, 
it is not possible to plot such kind of PDFs for all of them. 
It would be more convenient to use a summary statistics 
to give a good estimate for a parameter and its spread. 
Here, we use the median and percentile statistics. The 
median are found by firstly sorting all values in ascending 
order, then taking the element in the middle so that half 
of all points are below the median and the other half 
above it. The lower and upper quartiles are the values 
below which 25 and 75 percent of points fall, respectively. 
This statistics is much better than the mostly used mean 
and standard deviation statistics when the distribution 
of PDF is asymmetrically skewed or multimodal. 

4.2.4. Relations between starburst and AGN parameters 

In Table [2] and [3l we present the estimated starburst 
and AGN parameters for class A and B HLIRGs by em- 
ploying the 'SB-f- AGN' model. With the estimated star- 
burst and AGN parameters of these HLIRGs, it would 
be interesting to explore some possible relations between 
these parameters, especially those between starburst and 
AGN. However, with the very limited observations not 
all parameters can be well constrained. In Figure [9l we 
present some relations between the starburst and AGN 
paramet ers th at are relatively better constrained. 

Figure 9(a) show the IR luminosity of starburst and 
AGN for all HLIRGs in the sample. As shown clearly, 
the IR luminosity of most class A HLIRGs are dominated 
by AGN, while the IR luminosity of class B HLIRGs are 
dominated by starburst. This is consistent with the con- 
clusions drawn accordi ng to the B a yesian evidences as 
shown in Section [4.2.21 iRuiz et all ( 20100 classified the 
HLIRGs in their sample into class A and B according to 
the shape of their SEDs. So, our results show that the 



class A and B HLIRGs essentially differ in their domi- 
nating emission source. 

Figure 9(b) show the relation between the IR luminos- 
ity of AGIN and the fraction of OB star in the starburst 
region. The figure shows an anti-correlation between the 
fraction of OB star in the starburst region and the IR 
luminosity of AGN in the center. This implies that the 
starburst in class B HLIRG is younger than that in class 
A HLIRG. On the other hand. Figure 9(c) also show an 
anti-correlation between the optical depth of clumps in 
AGN torus and the fraction of OB star in the starburst 
region. This may imply that the AGN torus in class A 
HLIRGs are more dusty than those in cl ass B HLIRGs. 
Furthermore, the results in Figure |9(d)| show that the 
starburst region in class B HLIRG seems more compact 
than that in class A HLIRG. 

5. SUMMARY 

Dust-obscured starburst- AGN composite galaxies, 
such as ULIRGs and HLIRGs, represent important 
phases in the formation and evolution of galaxies. It is 
still very challenging to understand the nature of these 
interesting but complex galaxies from their SEDs. This 
can be achieved from the interplay between modeling and 
fitting of their SEDs. However, a self-consistent multi- 
wavelength SED model for such complex systems must 
contain many parameters, and can only be established 
step by step. So, a flexible, efficient and robust SED fit- 
ting tool is very necessary. In light of these, we developed 
a suite of general purpose programs, called BayeSED, for 
doing Bayesian analysis of SEDs. The PGA and ANN 
techniques are employed to allow an accurate and effi- 
cient generation of model SEDs. Meanwhile, the state- 
of-art Bayesian inference tool, MultiNest, is interfaced 
with ANN to allow a highly efficient sampling of posterior 
distributions and the calculation of Bayesian evidence. 

As a demonstration, we apply this code to a HLIRG 
sample. By employing three models, we present a com- 
plete Bayesian analysis of their SEDs, including model 
comparison and parameter estimation. According to 
the computed Bayesian evidence of different models and 
the estimated IR luminosity of starburst and AGN, we 
found that the c lass A and B HLIRG as defined by 
I Ruiz et all (|2010f ) essentially differ in their dominating 
emission source. On the other hand, we found some re- 
lations between the estimated starburst and AGN pa- 
rameters. For example, the AGN torus of the HLIRGs 
dominated by AGN tend to be more dusty than that of 
HLIRGs dominated by starburst. The starburst region 
of the HLIRGs dominated by starburst tends to be more 
compact and has a higher fraction of OB star than that 
of HLIRGs dominated by AGN. 

These results are understandable in the context 
of galaxy merger d riving starburst and delayed 



1988 



2005 



AGN activity ( Gcnzcl et al.Hl998t ISanders et all 
iKauffmann & Hachnclt '20 001: iDi Ma tteo et jlT 
Springcl et al. 2005; Hopkins_eLalJ [2006, 200^ There 
may be an evolution path from class B HLIRG to class 
A HLIRG. The class B HLIRG may represent the stage 
where a powerful AGN has already been triggered but 
still not outshine the starburst, while in the state rep- 
resented by class A HLIRG, the powerful AGN in the 
center becomes dominating the output of energy. How- 
ever, the sample studied here is still very small. Further 
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Fig. 6. — Best fit (or MAP) model SEDs for class A HLIRG obtained from sampling the 'SB+AGN' model, which has the highest Bayesian 
evidence among the three models considered. The dotted, dashed, and solid lines represent the starburst component, AGN component and 
total, respectively. 
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studies based on more complete samples of HLIRGs and 
VlieoBfiifig^ rocp%teofSPlMefie4lkgb^#yiillSS hypoth- 
esis. 

Generally, we believe BayeSED can be a useful tool for 
understanding the nature of complex systems, such as 
dust-obscured starburst-AGN composite galaxies, from 
decoding their SEDs. In the future works, we will apply 
this code to other larger samples to explore the interplay 
between starburst and AGN activities in these interest- 
ing galaxies. On the other hand, there is still no well 
established SED models specifically for starburst-AGN 
composite galaxies. So, it would be interesting to explore 
if a self-consistent SED model specifically for composite 
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can have higher Bayesian evidence than a linear 
combination of Starburst -1- AGN models. 
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TABLE 2 

The estimated stahburst parameters and corresponding uncertainties for class a and B HLIRGs by employing the 
'SB+AGN' MODEL. The median and percentile statistics are used to the obtain the best estimation of a parameter and its 

UPPER and lower limits. 
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TABLE 3 

Similar, to Table [2] but for AGN parameters. 
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Fig. 9. — Some relations between starburst and AGN parameters. The red-solid and green-dashed error bars arc results for class A and B 
HLIRG, respectively, (a) The IR luminosity of starburst and AGN for all HLIRGs in the sample. The black-solid line represents position 
where the IR luminosity of starburst and AGN are equal, (b) The relation between the IR luminosity of AGN and the fraction of OB star 
in the starburst region, (c) The relation between the optical depth of clumps in AGN torus and the fraction of OB star in the starburst 
region, (d) The relation between the IR luminosity of AGN and the size of starburst region. 
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